Overview of the IA-64 Architecture

Author

  • Karthik Swaminathan
Abstract

Introduction
Architecture Design
  1. Support for two Operating System Environments
  2. Ability to handle IA-32 Instruction sets in the IA-64 operating environment
  3. Unique Instruction Format aka “Bundling”
  4. Instruction set features & Instruction Sequencing
  5. Registers
    5.1. Application State Registers and their usage
    5.2. System State Registers and their usage
  6. Register Stack
  7. Memory and Addressing
  8. Processor to Compiler interaction (& vice versa)
  9. Interruption Handling
In Comparison to RISC/CISC processors & areas of use
Comparison to Sun Microsystems’ SPARC
Conclusion
Bibliography
List of Figures

Introduction:

For the purpose of this assignment, I have chosen to document one of the most recent developments in the microprocessor industry. The architecture known as EPIC (Explicitly Parallel Instruction Computing), as applied particularly to IA-64, Intel’s 64-bit microprocessor architecture, is the subject of my discussion. Throughout this paper I shall use the terms IA-64 and “Itanium” (the first processor in this series) synonymously.

Existing architectures are based on an out-of-order execution model, which requires increasingly complex hardware mechanisms. Performance limiters such as branches and memory latency increasingly impede these processors. Intel’s IA-64 processor architecture is designed to overcome these limitations. In addition, the IA-64 architecture provides the performance headroom and scalability needed for future computation-intensive applications.

The IA-64 architecture features a revolutionary 64-bit instruction set architecture (ISA), which applies a new processor technology called EPIC. Jointly defined with Hewlett-Packard Company, EPIC embodies a set of advanced computer architecture techniques such as explicit parallelism, predication, and speculation. These techniques enable IA-64 processors to execute more instructions per clock cycle and so deliver superior performance relative to today’s out-of-order RISC processors.
The EPIC technique eliminates much of the dependency checking and grouping logic that consumes an increasingly large portion of advanced RISC and x86 processors. EPIC’s flexible grouping mechanism solves the two fatal flaws of VLIW-based processors: excessive code expansion and lack of scalability.

This paper is organized into the following sections:
• Architecture Design – Key features of the IA-64 architecture and their explanation.
• Performance Comparison to RISC/CISC processors.
• Comparison to Sun Microsystems’ SPARC series – This comparison is done at a high level and not at a feature level.
• Conclusion.
• Bibliography.
• List of Figures that highlight some of the architectural details.

Architecture Design:

I was able to locate some information about the IA-64 architecture and its programming model. In this section I shall try to condense what are, in my opinion, the most critical pieces of the IA-64 architecture. Please refer to Figure 1, in the List of Figures, for a diagram that depicts the high-level “Itanium” architecture and exposes the synergy between software and hardware.

1. Support for two Operating System Environments:

IA-64 handles two operating system environments within a single architecture: the IA-32 environment, which supports 32-bit operating systems, and the IA-64 (native) environment, which supports 64-bit operating systems.

The IA-32 system environment, which includes full support for the IA-32 instruction set, can be used for IA-32 Protected Mode, Real Mode, and Virtual 8086 Mode applications and operating systems. This environment makes the IA-64 architecture backward compatible with the “Pentium” series of processors.

The IA-64 system environment can be used to run IA-32 Real Mode, Protected Mode, and Virtual 8086 Mode applications if the 64-bit operating system supports them. It naturally includes support for 64-bit applications. The way this is implemented is elaborated in the next section.

2. Ability to handle IA-32 Instruction sets in the IA-64 operating environment:

The IA-64 operating environment allows the execution of full 32-bit binaries compiled on IA-32 systems, provided the required platform and firmware support exists on the system. This operating environment also gives the application developer the ability to intermix older x86 instruction sets with the native instruction set. Moreover, “Itanium” can convert x86 instructions into internal native-mode instructions before executing them. It can also accept native-mode instructions directly from memory, avoiding the inefficiencies of hardware translation; this indicates that the processor is optimized for native-mode execution rather than x86-mode execution.

From several technical publications, it can be gleaned that “Itanium” will allow x86 and IA-64 instructions to commingle at all levels of the memory hierarchy, including the on-chip cache, off-chip cache, and main memory. This allows the chip to maintain a single system interface that knows how to fetch instructions into the CPU. A “mode” bit on the chip directs the instructions to either the x86 decoder or the native decoder. The presence of this bit avoids the need for encoding the mode at the instruction level.

With speculative loads, a load can be placed as early as possible in the code, as long as its address can be computed. If the data is never checked, no exception will be triggered; if an exception occurs when the data is needed, the exception will be recognized in the load’s original “home block”. Speculative loads thus provide the compiler with maximum flexibility to hide cache latency.

Three special instructions, together with interruptions, are defined to transition the processor between the IA-32 and IA-64 instruction sets. Also refer to Figure 2, in the List of Figures, for an illustration.
• jmpe (IA-32 instruction) – Jump to an IA-64 target instruction, and change the instruction set to IA-64.
• br.ia (IA-64 instruction) – IA-64 branch to an IA-32 target instruction, and change the instruction set to IA-32.
• rfi (IA-64 instruction) – Return from interruption. It is defined to return to either an IA-32 or an IA-64 instruction when resuming from an interruption.
• Interruptions transition the processor to the IA-64 instruction set for all interruption conditions.

The jmpe and br.ia instructions provide a low-overhead mechanism to transfer control between instruction sets. These primitives are typically incorporated into “thunks” or “stubs” that implement the required call linkage and calling conventions to call dynamically or statically linked libraries.

3. Unique Instruction Format aka “Bundling”:

The IA-64 instructions use a unique format that allows the compiler to direct hardware execution without severely bloating the software. A single 128-bit aligned container called a “bundle” contains three 41-bit IA-64 instructions along with a 5-bit “template” that carries information about the “bundle”.

The “template” field is, in my opinion, the most interesting sequence of bits in the “bundle”. This field stores what are referred to as architectural stops. A stop indicates that one or more instructions before the stop may have certain kinds of dependencies on one or more instructions after the stop. The template therefore indicates whether the instructions in the “bundle” can be executed in parallel or whether one or more of them must be executed serially because of register dependencies. The template also indicates whether the bundle can be executed in parallel with the following “bundle”. In addition to storing the location of stops, the “template” field specifies the mapping of instruction slots to execution unit types. “Bundles” can be chained to form Instruction Groups of any length. Figure 3, in the List of Figures, depicts an IA-64 “bundle”.
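The bundle layout described above (a 5-bit template plus three 41-bit slots in 128 bits) can be sketched in a few lines of Python. This is only an illustration: the function names `pack_bundle` and `unpack_bundle` are hypothetical, and the sketch assumes the bundle is stored little-endian with the template in the five low-order bits and slots 0 to 2 following in ascending bit order.

```python
TEMPLATE_BITS = 5                     # width of the template field
SLOT_BITS = 41                        # width of each instruction slot
SLOT_MASK = (1 << SLOT_BITS) - 1
TEMPLATE_MASK = (1 << TEMPLATE_BITS) - 1


def pack_bundle(template, slots):
    """Pack a template and three slot values into 16 little-endian bytes.

    Assumed layout: template in bits 4:0, slot i starting at bit 5 + 41*i.
    """
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three instruction slots")
    value = template & TEMPLATE_MASK
    for i, slot in enumerate(slots):
        value |= (slot & SLOT_MASK) << (TEMPLATE_BITS + i * SLOT_BITS)
    return value.to_bytes(16, "little")


def unpack_bundle(raw):
    """Split a 16-byte bundle into (template, [slot0, slot1, slot2])."""
    if len(raw) != 16:
        raise ValueError("a bundle is exactly 128 bits (16 bytes)")
    value = int.from_bytes(raw, "little")
    template = value & TEMPLATE_MASK
    slots = [
        (value >> (TEMPLATE_BITS + i * SLOT_BITS)) & SLOT_MASK
        for i in range(3)
    ]
    return template, slots


# Round trip with arbitrary (not architecturally meaningful) field values.
raw_bundle = pack_bundle(0x10, [1, 2, 3])
decoded_template, decoded_slots = unpack_bundle(raw_bundle)
```

Note that 5 + 3 × 41 = 128, so the four fields fill the container exactly with no padding bits.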
“Bundles” are ordered from lowest to highest memory address. Instructions in “bundles” with lower memory addresses are considered to precede instructions in “bundles” with higher memory addresses. The byte order of each “bundle” in memory is little-endian (the template field is contained in byte 0 of a bundle). Within a “bundle”, instructions are ordered from instruction slot 0 to instruction slot 2, as shown in Figure 3.

4. Instruction set features & Instruction Sequencing:

There are six basic instruction types: ALU (A), Integer (I), Memory (M), Floating-point (F), Branch (B), and Long/Extended (L). All instructions in the instruction set of the “Itanium” implementation are 41 bits in length. The leftmost 4 bits (40:37) of each instruction are the major opcode. Opcode assignments for each instruction type are given in the “IA-64 application developers guide” published by Intel.

A basic IA-64 instruction has the following syntax:

[qp] mnemonic [.comp] dest=srcs

where:
• “qp” specifies a qualifying predicate register. When the value of the register is true, the instruction executes, its results are committed, and any exceptions that occur are handled as usual. When the value is false, nothing is committed and no exceptions are handled.
• “mnemonic” uniquely identifies an IA-64 instruction by name.
• “comp” specifies one or more instruction completers, which indicate optional variations on a base instruction mnemonic.
• “dest” represents the destination operand(s) and “srcs” represents the source operands. Most IA-64 instructions have at least two source operands.

Instruction execution consists of four phases:
• Read the instruction from memory (fetch).
• Read architectural state, if necessary (read).
• Perform the specified operation (execute).
• Update architectural state, if necessary (update).
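The effect of the qualifying predicate can be illustrated with a toy model in Python. This is a sketch under stated assumptions, not a description of the hardware: the function `run_predicated`, the tuple encoding of an instruction, and the register and predicate names are all hypothetical. It shows two instructions guarded by complementary predicates, so that only one of them commits its result.

```python
import operator


def run_predicated(program, regs, preds):
    """Run (qp, op, dest, srcs) tuples; commit only when predicate qp is true.

    regs maps register names to values, preds maps predicate names to bools.
    Both are hypothetical stand-ins for architectural state.
    """
    for qp, op, dest, srcs in program:
        if preds[qp]:
            # Predicate true: execute and commit the result.
            regs[dest] = op(*(regs[s] for s in srcs))
        # Predicate false: nothing is committed.
    return regs


# Both arms of a branch-free "if/else": the same destination, opposite guards.
program = [
    ("p1", operator.add, "r3", ("r1", "r2")),   # taken when p1 is true
    ("p2", operator.sub, "r3", ("r1", "r2")),   # taken when p2 is true
]

then_result = run_predicated(
    program, {"r1": 7, "r2": 5, "r3": 0}, {"p1": True, "p2": False}
)
else_result = run_predicated(
    program, {"r1": 7, "r2": 5, "r3": 0}, {"p1": False, "p2": True}
)
```

With `p1` true the addition commits and `r3` becomes 12; with `p2` true the subtraction commits and `r3` becomes 2. In neither case is a branch needed to choose between the two arms.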
If the instructions in an “instruction group”, defined in the previous section, meet all the resource dependency requirements, then the program behaves as though each individual instruction is sequenced through these phases in the order listed above. The instruction sequencing rules given below prescribe the order of a phase of a given instruction relative to any phase of a previous instruction.
• Since there is no a priori relationship between the “fetch” of an instruction and the “read”, “execute”, and “update” of any previous instruction, the “sync.i” and “srlz.i” (synchronize and serialize) instructions can be used to enforce a sequential relationship.
• Every instruction will behave as if its “read” occurred after the “update” of memory and ALAT state of all instructions from the previous instruction group and all instructions within the same instruction group. Moreover, within an instruction group, every instruction will behave as though its read of the register state occurred before the update of the register state by any instruction in that group, whether earlier or later. This eliminates WAR data dependencies.
• All instructions have unit latency, and instructions on opposing sides of a “stop” are separated by at least one unit of latency.

Apparently, the “Itanium” hardware puts the instruction “bundle” through a 10-stage in-order hardware pipeline. I could not locate much information on the pipeline except in an old presentation by Mr. Harsh Sharangpani, Principal Architect of the IA-64 microarchitecture. Intel has not yet released any technical documents on the working of the pipeline, and hence I shall not be talking about it further. Suffice it to note that the front-end pre-fetch and fetch stages of the pipeline allow the pre-fetching of up to 8 bundles, which are stored in a decoupled buffer upon fetch. This buffer allows the front-end to fetch even when the back-end is stalled.
The buffer also hides instruction cache misses and branch bubbles, thereby improving performance.

The issue logic, a hardware mechanism incorporated into the pipeline, issues the instructions to the CPU’s execution units. It is designed in a manner that promotes binary compatibility across current and future IA-64 implementations. This issue logic makes an IA-64 processor more complex than a pure VLIW design, but the ability to have binary-compatible processors is well worth the effort of incorporating the extra logic. The issue logic in the IA-64 is still much less complicated than that of out-of-order superscalar processors.

I could not find documentation on the supported addressing modes. I assume that IA-64 will support the same addressing modes as its IA-32 predecessors (extended, however, to handle the Very Large Memory model). These addressing modes, as a refresher, are absolute, register indirect, based, indexed, based indexed with displacement, based with scaled index, and based with scaled index and displacement.
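Returning to the sequencing rule in section 4 that register reads within an instruction group behave as though they occur before any register update in that group: the following Python sketch models that rule and shows why a WAR (write-after-read) hazard disappears. The function `execute_group` and the tuple encoding are hypothetical, and the sketch assumes the group contains no RAW or WAW register dependencies, which is what stops are there to separate.

```python
import operator


def execute_group(group, regs):
    """Execute one instruction group given as (dest, op, srcs) tuples.

    All register reads use a snapshot taken before the group starts, and all
    updates are applied afterwards, mirroring the read-before-update rule.
    Assumes no two instructions in the group write the same register.
    """
    snapshot = dict(regs)            # register state seen by every read
    updates = {}
    for dest, op, srcs in group:
        updates[dest] = op(*(snapshot[s] for s in srcs))
    new_regs = dict(regs)
    new_regs.update(updates)         # all updates become visible together
    return new_regs


initial = {"r1": 1, "r2": 2, "r3": 0, "r4": 10}

# The second instruction overwrites r1, which the first one reads (WAR).
group = [
    ("r3", operator.add, ("r1", "r2")),   # r3 = r1 + r2
    ("r1", operator.add, ("r4", "r4")),   # r1 = r4 + r4
]

in_order = execute_group(group, initial)
reversed_order = execute_group(list(reversed(group)), initial)
```

Both orderings give `r3 = 3` and `r1 = 20`, because the read of `r1` always sees the value from before the group began. This is the property that lets the instructions of a group be issued in parallel.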




Publication date: 2000